synthetic dataset
Adversarial Label Invariant Graph Data Augmentations for Out-of-Distribution Generalization
Zhang, Simon, DeMilt, Ryan P., Jin, Kun, Xia, Cathy H.
Out-of-distribution (OoD) generalization occurs when representation learning encounters a distribution shift. This occurs frequently in practice when training and testing data come from different environments. Covariate shift is a type of distribution shift that occurs only in the input data, while the concept distribution stays invariant. We propose RIA - Regularization for Invariance with Adversarial training, a new method for OoD generalization under convariate shift. Motivated by an analogy to $Q$-learning, it performs an adversarial exploration for counterfactual data environments. These new environments are induced by adversarial label invariant data augmentations that prevent a collapse to an in-distribution trained learner. It works with many existing OoD generalization methods for covariate shift that can be formulated as constrained optimization problems. We develop an alternating gradient descent-ascent algorithm to solve the problem in the context of causally generated graph data, and perform extensive experiments on OoD graph classification for various kinds of synthetic and natural distribution shifts. We demonstrate that our method can achieve high accuracy compared with OoD baselines.
- North America > United States (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
AutoStan: Autonomous Bayesian Model Improvement via Predictive Feedback
We present AutoStan, a framework in which a command-line interface (CLI) coding agent autonomously builds and iteratively improves Bayesian models written in Stan. The agent operates in a loop, writing a Stan model file, executing MCMC sampling, then deciding whether to keep or revert each change based on two complementary feedback signals: the negative log predictive density (NLPD) on held-out data and the sampler's own diagnostics (divergences, R-hat, effective sample size). We evaluate AutoStan on five datasets with diverse modeling structures. On a synthetic regression dataset with outliers, the agent progresses from naive linear regression to a model with Student-t robustness, nonlinear heteroscedastic structure, and an explicit contamination mixture, matching or outperforming TabPFN, a state-of-the-art black-box method, while remaining fully interpretable. Across four additional experiments, the same mechanism discovers hierarchical partial pooling, varying-slope models with correlated random effects, and a Poisson attack/defense model for soccer. No search algorithm, critic module, or domain-specific instructions are needed. This is, to our knowledge, the first demonstration that a CLI coding agent can autonomously write and iteratively improve Stan code for diverse Bayesian modeling problems.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland (0.04)
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)
Fairness under Graph Uncertainty: Achieving Interventional Fairness with Partially Known Causal Graphs over Clusters of Variables
Algorithmic decisions about individuals require predictions that are not only accurate but also fair with respect to sensitive attributes such as gender and race. Causal notions of fairness align with legal requirements, yet many methods assume access to detailed knowledge of the underlying causal graph, which is a demanding assumption in practice. We propose a learning framework that achieves interventional fairness by leveraging a causal graph over \textit{clusters of variables}, which is substantially easier to estimate than a variable-level graph. With possible \textit{adjustment cluster sets} identified from such a cluster causal graph, our framework trains a prediction model by reducing the worst-case discrepancy between interventional distributions across these sets. To this end, we develop a computationally efficient barycenter kernel maximum mean discrepancy (MMD) that scales favorably with the number of sensitive attribute values. Extensive experiments show that our framework strikes a better balance between fairness and accuracy than existing approaches, highlighting its effectiveness under limited causal graph knowledge.
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Law (0.68)
- Education (0.67)
- Information Technology (0.46)
- Information Technology > Data Science > Data Mining (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
- (2 more...)
CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting
The objective of dataset condensation is to ensure that the model trained with the synthetic dataset can perform comparably to the model trained with full datasets. However, existing methods predominantly concentrate on classification tasks, posing challenges in their adaptation to time series forecasting (TS-forecasting).
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.92)
- Europe > Switzerland > Zürich > Zürich (0.14)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- South America > Brazil > São Paulo (0.04)
- (9 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.67)
- Overview (0.67)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- (7 more...)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Austria (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)
Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment
To avoid redundancy in these synthetic datasets, it is crucial that each element contains unique features and remains diverse from others during the synthesis stage. In this paper, we provide a thorough theoretical and empirical analysis of diversity within synthesized datasets. We argue that enhancing diversity can improve the parallelizable yet isolated synthesizing approach.
- Asia > Singapore > Central Region > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Europe > Switzerland (0.04)
- North America > United States > Wisconsin (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Health & Medicine > Therapeutic Area (0.73)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Oceania > Australia > New South Wales (0.04)
- (13 more...)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)
- Banking & Finance (0.92)
- Transportation (0.92)
- Health & Medicine > Therapeutic Area > Endocrinology (0.68)